Topological inference on brain networks across subtypes of post-stroke aphasia

Persistent homology (PH) characterizes the shape of brain networks through the persistence features. Group comparison of persistence features from brain networks can be challenging as they are inherently heterogeneous. A recent scale-space representation of persistence diagram (PD) through heat diffusion reparameterizes using the finite number of Fourier coefficients with respect to the Laplace-Beltrami (LB) eigenfunction expansion of the domain, which provides a powerful vectorized algebraic representation for group comparisons of PDs. In this study, we advance a transposition-based permutation test for comparing multiple groups of PDs through the heat-diffusion estimates of the PDs. We evaluate the empirical performance of the spectral transposition test in capturing within- and between-group similarity and dissimilarity with respect to statistical variation of topological noise and hole location. We also illustrate how the method extends naturally into a clustering scheme by subtyping individuals with post-stroke aphasia through the PDs of their resting-state functional brain networks.


Introduction
Brain network modeling based on magnetic resonance imaging (MRI) is an effective approach to understand the functions and dysfunctions of the brain.Brain networks have an innate graph structure that have been traditionally studied through graphical or graph arXiv:2311.01625v1[stat.ME] 2 Nov 2023 theoretic models based on single-scale covariance estimation (S.Huang et al., 2010) or singlescale graph-theoretic measures (Rubinov & Sporns, 2010;Sporns, 2002).These models effectively characterize brain network topology and have become the norm for brain network analysis.However, it has recently been noted that single-scale models may not be sufficient in capturing the complexity of brain connectivity and multi-scale models are needed (Betzel & Bassett, 2017).On the other hand, a ubiquitous problem in brain network analysis is selection of threshold on edge weights to reveal significant connections within and between brain regions.Arbitrary threshold may cause problem of bias and consistency across studies (Drakesmith et al., 2015;Garrison, Scheinost, Finn, Shen, & Constable, 2015).A multi-scale approach to brain network modeling has thus become widely adapted through persistent homology (PH), which captures multi-scale features of data through invariant topological structures (Edelsbrunner, Letscher, & Zomorodian, 2002).Using the language of simplicial homology (Hatcher, 2001), PH reveals the underlying topological structures of data by their persistence through a dynamic assortment of points, edges, and triangles.The fact that the overall topological changes hold more significance over fleeting structures in PH makes the algorithm particularly robust under the presence of noise and artifacts, thus revealing more insight on network topology than single-scale measures (Carlsson, 2009).Topological characteristics of the dynamic changes through the PH process are summarized through persistence features.
Current persistence features consist of barcode and persistence diagram (PD), the original descriptors proposed by Edelsbrunner et al. (2002), and persistence landscape (PL) (Bubenik, 2015) and persistence image (PI) (Adams et al., 2017), both of which were developed when the demand increased for incorporating persistence features in statistical inference and machine learning models.Persistence features are inherently heterogeneous for noisy samples, even when the samples come from homogeneous underlying data objects.The heterogeneous nature of persistence features means that statistical inference for group comparison is not straightforward.Parametric inference often requires stringent distributional assumptions, which are rarely met by persistence features.So we utilize a nonparametric inference approach.Permutation testing is a standard nonparametric inference procedure for complex data objects and features without clear distributional properties.It is known as the exact test in statistics since the distribution of the test statistic under the null hypothesis can be exactly computed if we can calculate all the test statistics under every possible permutation.It is thus one of the most widely used inference procedures in neuroimaging studies where the data is typically complex in structure and the underlying distributional properties are difficult to quantify (Nichols & Holmes, 2002;Simpson, Lyday, Hayasaka, Marsh, & Laurienti, 2013;Winkler, Ridgway, Douaud, Nichols, & Smith, 2016).However, generating every possible permutation for brain networks with a large number of nodes is still extremely time consuming even for a modest sample size.Standard permutation testing through approximations only reaches a fraction of the exhaustive list of permutations and is computationally intensive.When the total number of permutations is large, various resampling techniques have been proposed to speed up the computation in the past (Nichols & Holmes, 2002;Winkler et al., 2016).These resampling methods generate a small fraction of possible permutations and the statistical significance is computed approximately.Neuroimaging studies typically generate 5,000-1,000,000 permutations, less than a fraction of all possible permutations.A few approaches have been developed to overcome the computational bottleneck for permutation testing on persistence features.The exact topological inference approach allows for fast permutation of monotone functions built on birth or death times in barcodes with respect to the Komogorov-Smirnov (KS) distance (Chung, Lee, Ombao, & Solo, 2019).This approach has quadratic run time that beats the exponential run time of standard permutation tests and has been extended to compare PLs (Wang, Behroozmand, Phillip Johnson, Bonilha, & Fridriksson, 2021;Wang, Ombao, & Chung, 2019).However, the approach is limited to comparing two features and not applicable for comparing between two sets of features.Another rapid permutation test based on transpositions does not require monotonicity and is applicable for comparing two sets of persistence features (Chung, Xie, et al., 2019;Songdechakraiwut & Chung, 2023).It has allowed us to develop a unified framework for topological inference through heat kernel estimation of PDs.
Inference and learning approaches comparing PDs have been built on confidence band (Fasy et al., 2014) to functional representations (Adams et al., 2017;Bubenik, 2015;Carriére, Oudot, & Ovsjanikov, 2015;Chen, Wang, Rinaldo, & Wasserman, 2015;Chung, Bubenik, & Kim, 2009;Pachauri, Hinrichs, Chung, Johnson, & Singh, 2011;Reininghaus, Huber, Bauer, & Kwitt, 2015), as comparing raw PDs consisting of planar scatter points encoding birth and death times of topological structures often require point matching through, for instance, the Hungarian matching algorithm, which quickly becomes formidable for largescale data.It is also unclear how we may compare two sets of raw PDs.The functional representation approach overcomes the issue of the points on raw PDs having arbitrary locations and provides an effective framework for downstream comparison.In this approach, PDs essentially undergo a smoothing process, in some cases through a scale-space representation from kernels for heat diffusion of Dirac delta functions uniquely representing the points of PD (Reininghaus et al., 2015).However, existing kernel features on PD are typically convoluted, which lacks flexibility when performing resampling-based statistical inference procedures such as permutation testing.A new scale-space representation of PD was recently proposed based on the heat kernel (HK) estimation (Kulkarni, Chung, Bendlin, & Prabhakaran, 2020), where the upper-triangular domain of PDs is represented using a finite number of Fourier coefficients with respect to the Laplace-Beltrami (LB) eigenfunction expansion of the domain.It provides a powerful vectorized algebraic representation for comparisons of PDs at the same coordinates, foregoing the need for matching across PDs due to their arbitrary point locations.Motivated by a topology-preserving spectral permutation test (Wang, Ombao, & Chung, 2018), we developed an inference procedure for comparing two sets of PDs estimated by the new scale-space representation by transposing the PD labels (Wang, Chung, & Fridriksson, 2022).By updating only the terms in an L 2 -distance between the mean HK estimates of two sets of PDs involved in each transposition, computation becomes much faster than standard permutation testing that exchanges an arbitrary number of labels in each iteration.This inference procedure generalizes the method developed by Wang et al. (2018) for comparing persistence features of single-trial univariate signals, where the resampling takes place at the signal level and thus cannot be directly applied to images and networks.The inference framework now resamples at the feature level, which allows us to compare PDs of images and networks.We have also extended it to a new topological ANOVA (T-ANOVA) approach to compare across multiple groups of PDs without dimensionality reduction, as well as a topological clustering scheme in application.
In this study, we establish a topological inference framework through stability of HK estimation on PDs.We evaluate the empirical performance of the spectral permutation test and T-ANOVA in simulation studies in detecting heterogeneous topological noise and hole location across multiple images.We also apply the methods to study topological difference in brain networks across subtypes of individuals with post-stroke aphasia.

Methods
Brain networks are typically modeled as a weighted graph, with the edge weights given by a similarity measure between the measurements on the nodes of the network (Bassett, 2006;Bien & Tibshirani, 2011).Suppose we have a network represented by the weighted graph G = (V, w) with the node set V = {1, . . ., p} and unique positive undirected edge weights w = (w ij ) constructed from a similarity measure such as Pearson's correlation.We define the binary network G ϵ = (V, w ϵ ) as a subgraph of G consisting of the node set V and the binary edge weights w ϵ defined by As we increase ϵ, which we call the filtration value, more edges are included in the binary network G ϵ and so the size of the edge set increases.Since edges connected in the network do not get disconnected again, we observe a sequence of nested subgraphs for any This sequence of nested subgraphs make up a Rips filtration where two nodes with a weight w ij smaller than ϵ are connected, and the birth and death of homological features in the form of clusters of nodes and holes formed by more than 3 edges are tracked through the filtration (Lee, Chung, Kang, Kim, & Lee, 2011;Lee, Chung, Kang, & Lee, 2014).We pair the birth and death times of clusters and holes as the coordinates of scatter points on a planar graph {(a i , b i )} L i=1 in the persistence diagram (PD).The persistence of clusters and holes is measured by the drop from their corresponding points to the y = x line on the PD.Long persistence indicates that the corresponding cluster or hole is more likely to be an underlying feature in the network.As an illustration in Figure 1, we see how a point that corresponds to a hole in a key shape stands out with high persistence in the PD from the Rips filtration constructed on a 100-point point cloud sampled from a key shape with a hole.

Heat kernel representation of persistence diagram
Since PDs do not form a vector space, they do not possess a natural statistical framework (Bubenik, 2015) and requires additional manipulation such as kernel smoothing for downstream statistical analysis.As with all noisy data, smoothing is needed for reducing noise (typically random, often artifactual) to better reveal the underlying data structure.We Left Three: The evolving 1-skeleton of a 100-point point cloud sampled from a key shape with a hole.Right: PD from the Rips filtration constructed on the 1-skeletons of the point cloud.The point in the PD that corresponds to the key hole stands out with high persistence -much further away from the diagonal (y = x) line than the rest of the points.
could either smooth data used to construct the networks or smooth persistence descriptors such as PD.In principle, topological noise and artifacts should be better removed with the latter approach as persistence descriptors are designed to capture topological structures, be they inherent or transient.Another important reason for smoothing PDs is that the heterogenous nature of raw PDs makes it difficult to perform various algebraic operations for statistical inference.Various smoothing methods have been applied to PDs such that statistical inference can be directly performed on them.Beginning with the work of Chung et al. (2009), each PD is discretized using the the uniform square grid and a concentration map is then obtained by counting the number of points in each pixel, which is equivalent to smoothing PD with a uniform kernel.This approach is somewhat similar to the voxel-based morphometry (Ashburner & Friston, 2000), where brain tissue density maps are used as a shapeless metric for characterizing concentration of the amount of tissue.Pachauri et al. (2011) followed up the approach by smoothing the PD by a Gaussian kernel centered at every point.Later, Bubenik (2015) proposed the persistence descriptor PL by representing the PD as a function in the Banach space L p (R 2 ) aimed at statistical analysis.PL is easily invertible to a PD, but overemphasizes the high-persistence features.To account for the overall pattern of persistence features, a persistence scale-space (PSS) kernel approach was then proposed by Reininghaus et al. (2015), where the points in PD are treated as heat sources modeled as Dirac-delta functions and used as an initial condition for a heat diffusion problem with a Dirichlet boundary condition on the diagonal.The closed-form solution of the diffusion problem is an L 2 (Ω) function obtained by convolving the initial condition with a Gaussian kernel, with Ω = {(x, y) ∈ R 2 : y ≥ x} being the closed half plane above the diagonal line y = x, and the feature map from the PDs to L 2 (Ω) at a fixed scale yields the PSS kernel.The Hilbert space structure of L 2 (R 2 ) can be used to construct a PL kernel similar to PSS (Reininghaus et al., 2015).The relatively new persistence descriptor PI sampled at discrete uniform grid to produce homogenous vectorized data out of PDs (Adams et al., 2017).PIs live in Euclidean space and are therefore amenable to a broader range of learning techniques than PLs (Adams et al., 2017).A new heat kernel representation for PDs has recently been proposed by Kulkarni et al. (2020), which not only allows a non-convoluted vectorized representation for comparisons at the same coordinates of PDs but also smoothing PD at different scales.It has also been combined with transposition test, a novel permutation testing approach, for fast inference on PDs (Wang et al., 2022).We provide in the next two sections a detailed description of heat kernel representation and transposition test on PDs.
Heat kernel representation has been established as a smoothing framework for noisy measurements on a general manifold M ⊂ R d (Chung, Dalton, Shen, Evans, & Davidson, 2007;Chung et al., 2014).We assume the fundamental stochastic model where f is the noisy measurement, h is the unknown signal, and ε is a zero-mean Gaussian random field.We make the general enough assumptions that f ∈ L 2 (M), the space of square integrable functions on M with the inner product where µ is the Lebesgue measure.A self-adjoint operator , induces orthonormal eigenvalues λ k and eigenfunctions ψ k on M: where, without loss of generality, we can sort the eigenvalues λ k such that and the eigenfunctions ψ k can be numerically computed by solving a generalized eigenvalue problem.Then, by Mercer's Theorem, any symmetric positive definite kernel can be written as Now consider the diffusion-like Cauchy problem with the initial condition h(σ = 0, p) = f (p).The partial differential equation ( 6) diffuses the noisy data h(p) over σ.For the self-adjoint operator L, (6) has the unique solution (Chung et al., 2007) which provides an estimate ĥσ (p) of the unknown signal h(p).The bandwidth σ controls the amount of smoothing in the estimate; as σ increases, ĥσ (p) becomes smoother.When L is the Laplace-Beltrami (LB) operator, the diffusion equation ( 6) becomes the isotropic heat diffusion equation and the kernel (5) becomes the heat kernel (HK) where the ψ k are the eigenfunctions of the LB operator ∆ satisfying for p ∈ M. The HK framework has been shown to be equivalent to kernel regression and wavelet (Chung et al., 2014).
To construct a HK representation of PD, we restrict the domain of diffusion to M = T = {(x, y) ∈ R 2 : y > x}, i.e. the upper triangular region above the diagonal line y = x where the scatter points of the PD D = {(a i , b i )} P i=1 are located.We constrain T within a certain range, such as standardizing the coordinates of the PD, so that T is bounded.Consider heat diffusion equation with the initial condition where The scatter points in the PD serve as the heat sources of the diffusion process.To simplify notation, we will refer to any series h(σ, p) as h σ (p) as the bandwidth σ is fixed.A unique solution to ( 9) is given by the HK expansion where is the HK with respect to the eigenfunctions ψ k of the LB operator ∆ satisfying ∆ψ k (p) = λ k ψ k (p) for p ∈ T , and are the Fourier coefficients with respect to the the LB eigenfunctions.The first eigenvalue λ 0 = 0 of the LB operator corresponds to eigenfunction , where µ(T ) is the area of the triangular region T and σ is the bandwidth of the HK.
The HK expansion (10) provides a vectorized representation of the PD D so that we can compare across PDs at the same coordinates.In practice, we include sufficiently large κ terms to approximate the HK expansion: which we refer to as the degree-κ HK estimate of the given PD.When σ = 0, we can completely recover the initial scatter points.As σ → ∞, it is essentially smoothing the PD with a uniform kernel on T .Figure 2 shows the HK smoothing of a PD with respect to the bandwidths σ = 0, 0.1, 1, 10.Note that the Fourier coefficients f k remain the same for all k when constructing the HK expansion at different diffusion scale σ.
As a distance measure for the HK-estimated PDs, we use the L 2 -distance between the functions h 1 , h 2 ∈ L 2 (T ) defined as where the h 1 k and h 2 k , k = 0, . . ., ∞, are the respective Fourier coefficients of h 1 and h 2 as defined in ( 12) with respect to the LB eigenfunctions.
In the standard kernel setup, we have the feature map where L 2 (T ) is the space of square integrable functions on T with the L 2 -distance between the functions g 1 , g 2 ∈ L 2 (T ) defined as where the g 1 k and g 2 k , k = 0, . . ., ∞, are the respective Fourier coefficients of g 1 and g 2 as defined in (12) with respect to the LB eigenfunctions as defined in (10) for a PD D ∈ D. This feature map corresponds to the kernel an explicit form of which is given by ( 11): We can show stability of the heat kernel as follows: The integral version of Jensen's inequality is for convex function ϕ (Matkowski & Nikodem, 1994).Following Jensen's inequality, We used the fact heat kernel K σ (p, q) is a probability distribution such that showing HK smoothing on PD is a contraction map (Chung, Wang, & Wu, 2018).Letting g = g 1 − g 2 , we have the stability results.The HK smoothing reduces the topological variability in PD.We use a simple example with each of two PDs containing one of the two points (−λ, λ) and (−λ + 1, λ + 1) (Reininghaus et al., 2015), as an illustration of the stability of the kernel smoothing procedures.When comparing two PDs, the L 2 -distance induced by the HK does not weigh over any points in the PDs, as the distance between the two points is , which remains constant as λ → ∞.In contrast, the PL-induced kernel distance is dominated by variations in the points of high persistence in the PDs, as the distance between the two points grows in the order of √ λ and is unbounded, whereas the Wasserstein distance and PSS-induced kernel distance do not over emphasize the high-persistence points, as the distance between the two points asymptotically approach a constant as λ → ∞ (Reininghaus et al., 2015).While the PSS kernel representation, like our HK representation of PD, also uses an exact solution to the heat diffusion problem with the original PD as the initial condition (Figure 3), the implicit form of the solution is difficult to manipulate for cost-effective resampling-based statistical inference.It is likewise difficult to manipulate the Wasserstein distance and PL-induced distance for the same purpose.

Permutation test on HK-estimated PDs
Existing kernel features on PD have been shown theoretically and empirically to work well with machine learning frameworks (Adams et al., 2017;Reininghaus et al., 2015) but are typically convoluted, which lacks flexibility when performing resampling-based statistical inference procedures such as permutation testing.Our past studies have shown powerful applications of the series representation of the heat diffusion problem, such as comparing the persistence features of brain signals through built permutation test based on HK estimates of signals (Wang et al., 2018), where we studied how topology of signals is preserved by permuting Fourier coefficients of sine and cosine basis functions.The approach provides a ground for permutation testing based on spectral components.The downside, however, is the computational load, with spectral permutation of single-trial signals requiring hours on end to converge.
Here, we use the HK for PD smoothing and subsequent statistical inference based on the HK-estimated PDs.Once we have the HK estimates of PD, we can use them as the basis for statistical inference.Suppose we want to permute the elements of two ordered sets with sizes m and n in a permutation test with the test statistic f (x, y).Under the null hypothesis, we assume exchangeability of x and y.Each permutation is an unrestricted rearrangement of the combined ordered set z = (x 1 , . . ., x m , y 1 , . . ., y n ) and we denote all possible (m + n)! permutations S m+n , which is a symmetric groups of order m + n.The standard approximate permutation test typically used in practice is built on on uniform sampling from the full set of permutations.The required number of permutations for convergence increases exponentially as the sample sizes increase.Even with sample sizes like m = n = 20, the random permutation test requires significant computational resources if we compute the test statistic for each exchange of group labels.
A transposition is defined as a permutation π ij that exchanges the i-th and j-th An example of a PD and PSS-and HK-estimated versions.
elements between x and y while keeping all the other elements fixed, i.e.
Any permutation in S m+n can be reached by a sequence of transpositions (Chung, Xie, et al., 2019).The random transposition is a random walk related to card shuffling problems and it is a special case of walk in symmetric groups (Aldous, 1983;Aldous & Diaconis, 1986).The walk between elements within x or y is also allowed but will not affect the computation a symmetric test functions.Instead of performing uniform random sampling in S m+n , we can perform a sequence of random walks and compute the test statistic at each walk.Consider walks in the two sample setting.We will determine how test statistic changes over each walk.Over random walk or transposition π ij , the statistic changes from ) directly, we can compute it from L(x, y) incrementally in constant run time by updating the value of L(x, y).If L is an algebraic function that only involves addition, subtraction, multiplication, division, integer exponents, there must exists a function M such that where the computational complexity of g is constant (Chung, Xie, et al., 2019) When we compare two groups of PDs with sample sizes m and n, we assume under the null hypothesis that the functional means of the HK expansion of PDs are the same for both groups, for a fixed bandwidth σ > 0. The Fourier coefficients in the HK expansion of population PDs in the two groups are unknown.We estimate them with the HK expansion of sample PDs {f i } and {g j } from the groups approximated by their degree-κ estimates: between the functional means as a test statistic for measuring the group difference in HK expansion of the PDs.We can algebraically show that f − ḡ In a standard approximate permutation test, the subject labels of the two groups are randomly exchanged.Here, we build the permutation test on transposition π ij that only exchanges the i-th and j-th subject labels between {f i , i = 1, . . ., m} and {g j , j = 1, . . ., n} and keeps all the other PDs fixed, i.e.
which we call a spectral transposition.Any permutation of the two groups of m and n subjects is reachable by a sequence of transpositions, which has been shown to be computationally much more efficient than the standard permutation testing procedure of exchanging all labels at once (Chung, Xie, et al., 2019).We generate the empirical distribution for the permutation test through the spetral transpositions.In one spectral transposition π ij , we obtain the L 2 -distance between the functional means of the degree-κ HK estimates of PDs based on transposed labels: where are the means of transposed Fourier coefficients.Since we know fk and ḡk already, we simply update the terms 1 m (g affected by the transposition.The pvalue of the spectral permutation test is then calculated as the proportion of L 2 -distances in the empirical distribution exceeding the L 2 -distance between the observed PDs.To ensure convergence, we perform upward of 100,000 permutations until the p-value stabilizes.

Topological analysis of variance via transpositions on HK-estimated PDs
Topological analysis of variance allows us to assess within-and between-group similarity and dissimilarity in PDs across multiple groups.The challenge of applying an ANOVA procedure to raw PDs is that they do not have unique means (Mileyko, Mukherjee, & Harer, 2011).Thus, Heo, Gamble, and Kim (2012) applied the standard ANOVA procedure to raw PDs reduced in dimensionality via Isomap.In contrast, our HK-estimates of PDs have well-defined functional means and L 2 -distance through Fourier coefficients, which provides a natural framework for topological analysis of variance on PDs without any dimensionality reduction beforehand.
To describe our heuristics in constructing an effective topological ANOVA framework, suppose the K groups of HK-estimated PDs are expressed as follows: Group 1 : Motivated by the standard ANOVA procedure, we could try and build an F -statistic comparing K groups of HK-estimated PDs through the L 2 -distance in (26).A topological between-group sum of squares could take the form of and a topological within-group sum of squares the form of where f ij is the HK-estimate of the j-th PD of the i-th group, f i is the functional mean of the HK-estimates of PDs in the i-th group, and f is the grand functional mean over the HK-estimates of all PDs.The functional means would serve as the topological centroids.
Ideally the F -statistic would follow F -distribution under some mild normality assumptions on the HK-estimated PDs, such as However, normality assumptions for heterogeneous features like PDs may be too strong to satisfy on multivariate data.Instead of fiddling with parametric constraints, we use a permutational ANOVA approach that bypasses the distributional issue and has found significant applications on multivariate data in response to complex experimental designs of ecological studies, where variables usually consist of counts of counts, percentage cover, frequencies, or biomass for a large number of species, and many other fields including chemistry, social sciences, agriculture, medicine, genetics, psychology, economics (Anderson, 2001(Anderson, , 2017)).Here we build our test statistic for the permutational ANOVA based on pre-calculated pairwise distances between PDs so that no recalculation of distances is required after each transposition.We will only need to update the within-and between-group sums of distances after each transposition.We will refer to our topological ANOVA procedure as T-ANOVA, where we define the topological between-group sum of squares (TSSB) and topological within-group sum of squares (TSSW) based on sums of pairwise L 2 -distances: We measure the between-and within-group disparity with the ratio statistic In each transposition, we randomly sample the group labels i 1 and i 2 out of the K groups with respect to the proportions of the group sizes n i /N .We then uniformly sample the subject labels j 1 and j 2 out of the i 1 -th and i 2 -th group respectively for transposition.We can prove by induction that any permutation between the groups can be reached by a sequence of transpositions through Theorem 1 in (Chung, Xie, et al., 2019) showing any permutation between two groups can be reached by a sequence of transpositions.
In a transposition, we only update the pairwise L 2 -distances in TSSB and TSSW affected by the transposition: where we adjust terms involving only groups i 1 and i 2 with ( 35) and (36).
where we adjust terms involving only groups i 1 and i 2 with (37) and ( 38), terms involving groups other than i 2 that are affected by i 1 with (39), and terms involving groups other than i 1 that are affected by i 2 with (40).The ratio statistic is then updated to The p-value of the T-ANOVA test is then calculated as the proportion of ϕ ′ in the empirical distribution exceeding the ϕ between the observed PDs.We keep the transposed labels as the current labels on which we build the next transposition and randomize all labels every 500 transpositions to improve convergence rate.

Performance Evaluation
We conduct two sets of simulation studies to evaluate performance of the two-sample transposition test and T-ANOVA.

Performance of two-sample transposition test
We investigate how the spectral transposition test detects underlying topological similarity and dissimilarity at the presence of topological noise and artifact.

Power of detecting hole in structure
We evaluate the power of the transposition test in detecting a key shape with a distinct hole (Figure 4  Left: We randomly sample 100 points from the image with an innate shape of a key.Top Right Row: Underlying key shape and possible locations of topological noise in the form of a small hole.Bottom Right Row: Variants of the key shape in Group 2. They could appear in the 4 pre-specified forms or randomly out of the variants.
of the first group are generated randomly from the part of the rectangular image, whereas the 100 points in each point cloud of the second group are generated randomly with a varied percentage (90%, 95%, 100%) of points from the shape of the key.Rips filtration is constructed on each point cloud.The proposed spectral permutation test is then applied to compare the PDs of the Rips filtrations in the two groups.When there are respectively 90%, 95%, and 100% points sampled from the shape of the key in the second group, the spectral permutation test rejects (p-value < 0.05) the null hypothesis of no group difference in 91, 100, and 100% of 100 simulations (corresponding means ± standard deviations of pvalues: 0.0124±0.0327,0.0041±0.0125,0.0008±0.0057,showing that the test stays sensitive in detecting the group shape difference when points in the second group are not entirely sampled from the shape of the key.

Robustness of performance under variation of topological noise and hole location
We conduct two studies to assess the robustness of the test when the underlying topological structure is 'contaminated' with heterogeneous topological noise and when the underlying structure undergoes non-topological changes.
We first evaluate the robustness of performance under heterogeneity of topological noise.In each of 100 simulations, we use the spectral transposition test to compare Group  1 Summary of mean±standard deviation of p-values from the spectral transposition test in 100 simulations.Top half: In each simulation, the test is used to compare a group of m random samples with a varied percentage (90%, 95%, 100%) of 100 points from the original key shape with a group of n random samples with the same percentage of 100 points from the key shape with topological noise in the form of a much smaller hole next to the keyhole.The location of the smaller hole in each random sample of the second group can be pre-specified or randomly chosen from the pre-specified options.Bottom half: In each simulation, the test is used to compare a group of m random samples with a varied percentage (90%, 95%, 100%) of 100 points from the original key shape with only the top left quarter of the keyhole left, with a group of n random samples with the same percentage of 100 points from the original key shape with a random quarter of the keyhole left.

Robustness under Variation of Topological Noise
1 of m random samples with a varied percentage (90%, 95%, 100%) of 100 points from the original key shape with Group 2 of n random samples from the key shape 'contaminated' with topological noise in the form of a much smaller hole next to the keyhole with prespecified (in such case m = 4 vs n = 4) or random locations (in such case m = 5 vs n = 5, m = 20 vs n = 20, or m = 100 vs n = 100).Figure 4 (top right row) shows the 4 possible locations of the topological noise in Group 2. We expect the test to stay robust to this topological noise.Table 1 summarizes the results for different percentage of points when the topological noise appears at pre-specified vs random locations.The spectral transposition test stays robust to the topological noise in fixed and random locations.
We then evaluate the robustness of performance under variation of hole location.In each of 100 simulations, we use the spectral transposition test to compare Group 1 of m random samples with a varied percentage (90%, 95%, 100%) of 100 points from the original key shape with only the top left quarter of the keyhole left, with Group 2 of n random samples with the same percentage of 100 points from the original key shape with a prespecified (in such case m = 4 vs n = 4) or random (in such case m = 5 vs n = 5, m = 20 vs n = 20, or m = 100 vs n = 100) quarter of the keyhole left.Figure 4 (bottom right half) shows the 4 possible variants of the keyhole in Group 2. We expect the test to stay robust to this change in structure, which is not topological in nature.Table 1 summarizes the results for different percentage of points when the variants appears at pre-specified vs random locations.The spectral transposition test stays robust to the structural variants in fixed and random locations.

Computational time
The computational time of the spectral transposition test grows steadily as the group sample sizes grow.The mean time for each simulation run for m = 4, 5 vs n = 4, 5 between 7 and 10 seconds and standard deviation within 3 seconds.For m = 20 vs n = 20, the mean time for each simulation run is between 8 and 10 seconds and standard deviation within 3 seconds.For m = 100 vs n = 100, the mean time for each simulation run is between 9 and 11 seconds and standard deviation within 3 seconds.

Performance of T-ANOVA
In each of the simulation studies in this section, we test the performance of the T-ANOVA in comparing three groups of point clouds simulated under different settings.The performance is compared against the standard PERMANOVA test (Anderson, 2001), as well as the topological analysis of variance test proposed by Heo et al. (2012) that runs the univariate ANOVA on dimensionality-reduced PDs by Isomap.

Sensitivity in detecting differential hole presence among multiple groups
In each simulation, three groups of n 1 , n 2 , n 3 100-point point clouds are generated, where the 100 points in each point cloud of the first two groups are generated randomly from the part of the rectangular image, whereas the 100 points in each point cloud of the third group are generated randomly with a varied percentage (90%, 95%, 100%) of points from the shape of the key (Figure 5).Table 2 shows the results of the T-ANOVA test in comparison with the other tests.

Robustness under variation of noise and hole location
We conduct two studies to assess the robustness of the test when the underlying topological structure is 'contaminated' with heterogeneous topological noise and when the hole location shifts as in Section .
We first evaluate the robustness of T-ANOVA under heterogeneity of topological noise.In each of 100 simulations, Group 1, 2, 3 of respective n 1 , n 2 , n 3 random samples are generated with a pre-specified percentage (90%, 95%, 100%) of 100 points from the original key shape 'contaminated' with topological noise in the form of a much smaller hole next to the keyhole with pre-specified (in such case n 1 = n 2 = n 3 = 4) or random locations (in such case n 1 = n 2 = n 3 = 5, n 1 = n 2 = n 3 = 20, or n 1 = 5, n 2 = 20, n 3 = 100).The 4 possible locations of the topological noise in each group are the same as Figure 4. We use the T-ANOVA test to compare PDs across the three groups.We expect the test to stay robust to the topological noise.Table 3 (top half) shows the results of the T-ANOVA test in comparison with standard PERMANOVA and the topological ANOVA proposed by Heo et al. (2012).
We then evaluate the robustness of T-ANOVA under variation of hole location.In each of 100 simulations, Group 1, 2, 3 of respective n 1 , n 2 , n 3 random samples are generated with a pre-specified percentage (90%, 95%, 100%) of 100 points from the original key shape with a pre-specified (in such case n 1 = n 2 = n 3 = 4) or random (in such case n 1 = n 2 = n 3 = 5, n 1 = n 2 = n 3 = 20, or n 1 = 5, n 2 = 20, n 3 = 100) quarter of the keyhole left.The 4 possible variants of the key shape in each group are the same as Figure 4 .We use the T-ANOVA test to compare PDs across the three groups.We expect the test to stay robust to this change in structure, which is not topological in nature.Table 3 (bottom half) shows the results of the T-ANOVA test in comparison with the other tests.

Computational time
Table 4 shows the means and standard deviations of computational times for one million transpositions under the topological noise setting (the hole location and sensitivity studies have similar computational times, so we only present one setting here).Just like the two-sample test, T-ANOVA shows steady growth of computational time as group sample sizes increase, in comparison with the sharp time growth of PERMANOVA.Heo's ANOVA is fast as it runs a univariate ANOVA on the dimensionality-reduced PDs.

Summary
The results show that the performance of our T-ANOVA test is comparable with the two baseline methods in terms of robustness under variation of topological noise and hole location, as well as sensitivity in detecting differential hole presence among multiple groups.In comparison with PERMANOVA, the advantage of the transposition approach of T-ANOVA shows up in the steady growth of computational time as group sample sizes increase.Although T-ANOVA is comparable in performance as Heo's ANOVA, it does not require dimensionality reduction of PDs.More importantly, it has a natural framework for distance-based clustering, which we illustrate in the Application section.

Application
Stroke is the leading cause of severe adult disability in the United States (Tsao et al., 2022).A left-hemisphere stroke commonly leads to aphasia, a speech-language disorder often classified into subtypes according to behavioral symptoms.Traditional subtypes of aphasia are determined through the Aphasia Quotient (AQ) subtest scores of the Revised Western Aphasia Battery (WAB-R) (Kertesz, 2007) that assess speech and language abilities such as spontaneous speech fluency, auditory comprehension, repetition, and naming performance.These scores binarize the patients into categories.For instance, the sponta- 0.0000 ± 0.0000 0.0000 ± 0.0000 0.0000 ± 0.0000 95% 0.0000 ± 0.0000 0.0000 ± 0.0000 0.0000 ± 0.0000 90% 0.0000 ± 0.0000 0.0000 ± 0.0001 0.0000 ± 0.0000 Summary of mean±standard deviation of p-values from the T-ANOVA, Heo's ANOVA, and PERMANOVA in 100 simulations.In each simulation, three groups of n 1 , n 2 , n 3 100-point point clouds are generated, where the 100 points in each point cloud of the first two groups are generated randomly from the part of the rectangular image, whereas the 100 points in each point cloud of the third group are generated randomly with a varied percentage (90%, 95%, 100%) of points from the shape of the key.neous speech fluency score (≥ 5 vs. ≤ 4) is a rating based on subjective evaluation mostly about quantity and grammaticality of output along with other features, such as word-finding difficulty, paraphasias, and hesitations.It separates individuals into fluent and non-fluent categories.Eight traditional subtypes thus arise from the binarized categories of fluency, comprehension, and repetition (Figure 6).Studies over the years have addressed WAB subtyping issues since its initial version in 1982 and proposed new ways identifying coherent clusters of aphasia subtypes (Crary, Wertz, & Deal, 1992;Ferro & Kertesz, 1987;Fromm, Greenhouse, Pudil, Shi, & MacWhinney, 2022;John et al., 2017).Unsupervised learning approaches such as K-means clustering has been applied to behavioral scores beyond WAB-R to redefine aphasia subtypes (Fromm et al., 2022).There is, however, a lack of exploration on aphasia subtyping via clustering brain network features.Our goal is aimed at identifying patterns of damage in the brain networks that lead to overlapping behavioral deficits.This study takes a topological angle at the clustering and inference of the resting-state functional brain networks of aphasic individuals, and summarizing basic statistical characteristics of the WAB-R AQ subtest scores of the clusters.

Data acquisition and preprocessing
The rs-fMRI data were acquired from 103 participants with aphasia resulting from a single ischemic or hemorrhagic stroke involving the left hemisphere on a Siemens Prisma 3T scanner with a 20-channel head coil located at the Center for the Study of Aphasia Recovery at the University of South Carolina.

Figure 6
Traditional aphasia subtypes according to binary categories of fluency, comprehension, and repetition.
size, and a 72-degree flip angle, 50 axial slices (2 mm thick with 20% gap yielding 2.4 mm between slice centers), repetition time TR =1650 ms, TE=35 ms, GRAPPA=2, 44 reference lines, interleaved ascending slice order.During the scanning process, the participants were instructed to stay still with eyes closed.A total of 370 volumes were acquired.
The preprocessing procedures of the rs-fMRI data include motion correction, brain extraction and time correction using a novel method developed for stroke patients (Yourganov, Fridriksson, Stark, & Rorden, 2018).The Realign and Unwarp procedure in SPM12 with default settings was used for motion correction.Brain extraction was then performed using the SPM12 script pm_brain_mask with default settings.Slice time correction was also done using SPM12.The mean fMRI volume for each participant was then aligned to the corresponding T2-weighted image to compute the spatial transformation between the data and the lesion mask.The fMRI data were then spatially smoothed with a Gaussian kernel with FWHM= 6 mm.To eliminate artifacts driven by lesions, a pipeline proposed by Yourganov et al. (2018) was applied on the the rs-fMRI.The FSL MELODIC package was used to decompose the data into independent components (ICs) and to compute the Z-scored spatial maps for the ICs.The spatial maps were thresholded at p < 0.05 and compared with the lesion mask for the participant.The Jaccard index, computed as the ratio between the numbers of voxels in the intersection and union, was used to quantify the amount of spatial overlap between the lesion mask and thresholded IC maps, both of which were binary.ICs corresponding to Jaccard index greater than 5% were deemed significantly overlapping with the lesion mask and then regressed out of the fMRI data using the fsl_regfilt script from the FSL package.By applying the automated anatomical labelling (AAL) atlas, 116 regions of interest (ROIs) were created and used as nodes in the brain networks subsequently constructed.
The Aphasia Quotient, a score strongly related to the overall lesion damage in brain, was measured in the participants.In terms of behavioral measures, the following WAB-R subscores were used to measure performance of participants in fluency, comprehension, repetition, object naming, and sentence completion: Information Content, Fluency Rating, Spontaneous Speech Rating, Comprehension Yes/No Questions, Comprehension Auditory Words, Comprehension Sequential Commands, Comprehension Subscore, Repetition Subscore, Object Naming, Word Fluency, Sentence Completion, Responsive Speech, and Naming Subscore.

Resting-state functional brain network and filtration
We first constructed resting-state functional brain networks from the rs-fMRI described above.The 116 AAL ROIs served as the nodes of the resting-state functional network of each individual and Pearson's correlation between the BOLD signals at two ROIs serve as their edge weight.A Rips filtration was built on the resting-state functional correlation matrix of each individual.The PDs decoding the birth and death times of 1-cycles in the individual Rip filtrations were then smoothed with the HK representation.

Aphasia subtyping via topological clustering of brain networks
Topological clustering has been applied in different angles to studies of resting-state functional brain networks (Chung et al., 2023;Stolz, Harrington, & Porter, 2017).To the best of our knowledge, this is the first study to explore aphasia subtyping through topological clustering of resting-state functional brain networks.Here we take advantage of the HK representation of PDs and extend the T-ANOVA into a topological clustering scheme where clusters were identified with respect to topological centroids calculated as the functional means of the HK estimates of the PDs representing the brain networks of individuals in the study.We compared the statistical characteristics of the topological clusters to baseline clusters obtained through K-means clustering of the WAB-R subscores.We repeated the clustering process 100 times in each instance and checked for consistency across repetitions.Three topological clusters had the overall best fit so we compared the results of three baseline clusters with them.
The overall lesion map and average absolute connectivity of three baseline and topological clusters are shown in Figure 7.The lesion map was created by augmenting stroke lesion damage in the brain of all subjects within each cluster.Note that the three baseline clusters appear to be confounded by the overall lesion extent of the subjects as they show distinctly different lesion extent (Cluster 1 > Cluster 2 > Cluster 3).This is confirmed by the AQ and subscore distributions summarized in Figure 8, where the AQ score is known to positively correlate with lesion extent and the subscore distributions show a distinct monotone pattern consistent with that of AQ across clusters.On the other hand, the topological clusters do not appear to be confounded by lesion extent as the lesion extent do not vary significantly across the clusters and the subscore distributions do not follow a specific trend with reference to the AQ score.As of the average connectivity, we see different connectivity patterns in the three topological clusters, whereas the baseline clusters show similar connectivity pattens.To confirm that the topological clusters did capture significant statistical difference in brain networks, we also compared the brain networks across different clusters through the T-ANOVA on their HK-estimated PDs. Figure 9 shows the empirical distribution of the ratio statistic based on L 2 -distances of HK-estimated 1-dimensional PDs within and between the three clusters over 1 million transpositions.The observed value of the ratio statistic was 5.4728, yielding a p-value of 0 and the conclusion of significant topological difference between the one-dimensional hole presence in the three clusters of brain networks.Now, using the three topological clusters as a basis for exploring aphasia subtypes, behavioral measures in the form of WAB-R subscores across the three clusters/subtypes have pattern of median and interquartile range summarized in Table 5.In terms of the three categories (fluency, comprehension, repetition) used for traditional aphasia subtyping, the comprehension subscores (Comprehension Yes/No Questions, Comprehension Auditory Words, Comprehension Sequential Commands, Comprehension Subscore), Repetition Subscore, and Word Fluency show an overall pattern low-medium-high in medians across Cluster 1, 2 & 3, with the exception of Comp.Yes/No Qs which sees some leveling off in Cluster 2 & 3, whereas Fluency Rating shows a low-high-medium pattern across the three clusters.The low, medium, and high are all in comparison to the median over all subjects.

Discussion
In this study, we established a topological inference framework based on HK representation of PDs.Although it does not require the PDs to be extracted from a specific type of data, we centered the application of the methods around group comparison of PDs The lesion map (left two columns) and average absolute connectivity (right two columns) of three topological and baseline clusters.
from brain networks.But simulating brain networks with holes is not straightforward, so we used point clouds from images with an underlying shape in illustration and simulation studies.We also extended the framework to topological clustering of brain networks with application to subtyping individuals with post-stroke aphasia.
Methodologically, the topological inference framework filled a few gaps left from our Topological Clusters

Baseline Clusters
Box plots of subscores for all participants and those in each of the three clusters.
previous works.As we pointed out in Section , the spectral transposition test generalizes the permutation test proposed by Wang et al. (2018) that compares single-trial signals  & ADNI, 2021).In future studies, the proposed spectral permutation method can be easily adapted for deep learning where the input is augmented persistence features reconstructed from resampled HK coefficients of PDs.
Since the analytical paradigm proposed in the methods section was already complicated, we featured topological clustering only in application.In the application of topological clustering and inference to subtyping individuals with post-stroke aphasia, one would argue that three clusters may be too sparse for actual clinical interpretation even though three clusters were empirically determined to have the best fit.Future studies can refine the approach by exploring more clusters, e.g.matching the number of traditional subtypes to see if they have any consistency.sions on the early versions of the manuscript.Funding sources: NIH R01DC017162 and R01DC01716202S1 (PI: RHD).Author contributions: conceptualization (YW, RHD), statistical analysis (YW, JY), interpretation of results (YW, RHD), writing and editing (All).Compliance with ethical standards: The neuroimaging scans were approved by the Institutional Review Board (IRB) at the University of South Carolina.5 Pattern of median and interquartile range of WAB-R subscores across three topological clusters/subtypes.

Figure 2
Figure 2 where f i k and g j k , k = 0, . . ., κ, are the Fourier coefficients with respect to the k-th LB eigenfunction ψ k .Their functional means are f mean Fourier coefficients.We then use the L 2 -norm difference f − ḡ 2 2 Figure 4

Figure 9
Figure 9 The following imaging parameters of images were used: a multiband sequence (x2) with a 216 × 216 mm field of view, a 90 × 90 matrix Conflict of interest and disclosure: None.Pre-specified Location: n 1 = n 2 = n 3 = 4 Percentage of Points in Key Shape Summary of mean±standard deviation of computational time in seconds for a million transpositions by the T-ANOVA and PERMANOVA in 100 simulations under the topological noise setting.